5.8.1 Prediction and residual values

All regression variants in microdata.no have associated commands that generate, among other things, residual and prediction values. These values can be used to analyze the spread of the data and to test regression models. Prediction values can also be used as input for further analyses.

Each of these commands has the same name as the associated regression command, with the suffix -predict added (for example, mlogit-predict for mlogit).

Syntax:

mlogit-predict <variable> <variable list> [if <condition>] [, <options>]

The variables are specified in the same way as for the corresponding regression model run with the mlogit command.

The following values can be retrieved: probability values and prediction values.

You choose which values to generate by means of options. Each run produces a set of variables that contain the requested values. By default, probability values are generated, but it is still recommended to specify the value type through options, because this lets you name the generated variables inside the parentheses, as shown in the syntax example below. If you run several predict commands, you must give the generated variables new names in each run.

Syntax example:

mlogit-predict wagecat age man highwealth, predicted(pred6) probabilities(prob6)
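
The relationship between the two value types can be illustrated outside microdata.no. The following Python sketch is not microdata.no code and does not show how mlogit-predict is implemented. It assumes the textbook multinomial logit formulation, in which each outcome category gets a linear score and the probabilities follow from a softmax, and it assumes that the prediction value is the category with the highest probability. The function names, category labels, and scores are hypothetical.

```python
import math


def category_probabilities(linear_scores):
    """Turn one linear score per outcome category into probabilities (softmax)."""
    highest = max(linear_scores.values())
    # Subtracting the highest score keeps exp() numerically stable.
    exp_scores = {cat: math.exp(s - highest) for cat, s in linear_scores.items()}
    total = sum(exp_scores.values())
    return {cat: value / total for cat, value in exp_scores.items()}


def predicted_category(probabilities):
    """Assumption: the prediction is the category with the highest probability."""
    return max(probabilities, key=probabilities.get)


# Hypothetical linear scores for one person; the base category is fixed at 0.
scores = {"low": 0.0, "medium": 0.8, "high": -0.4}
probs = category_probabilities(scores)
pred = predicted_category(probs)
print(probs, pred)
```

Under these assumptions, the probabilities for one person sum to 1, and the predicted category is simply the most likely one.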

The automatically generated variables can be used as input for further analyses or displayed graphically. Relevant graphical commands are hexbin and histogram. By running a histogram on the residual variable, one can check whether the residuals are normally distributed. The hexbin command can also be used to create anonymized scatter plots that combine two sets of values.
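
As a rough illustration of the residual check described above, the following Python sketch computes residuals for a numeric outcome and counts them per histogram bin. It is not microdata.no code. The data, the bin width, and the function names are hypothetical, and the sketch only shows what a histogram of residuals summarizes. A roughly symmetric, bell-shaped pattern of counts around zero is what one would look for.

```python
import math
from collections import Counter


def residuals(observed, predicted):
    """Residual = observed value minus predicted value, unit by unit."""
    return [o - p for o, p in zip(observed, predicted)]


def histogram_counts(values, bin_width):
    """Count values per bin; bin k covers [k * bin_width, (k + 1) * bin_width)."""
    return Counter(math.floor(v / bin_width) for v in values)


# Hypothetical observed and predicted values for five units.
observed = [10.0, 12.5, 9.0, 11.0, 13.0]
predicted = [10.5, 12.0, 9.5, 11.0, 12.0]

res = residuals(observed, predicted)
counts = histogram_counts(res, bin_width=0.5)

# Print the bins in ascending order with their lower edge and count.
for bin_index in sorted(counts):
    print(bin_index * 0.5, counts[bin_index])
```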

For more details, run the command help mlogit-predict.

▷ Example: Prediction and residual values analysis